Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate the routing for control flows and data flows #1082

Merged
merged 14 commits into from
Feb 12, 2025

Conversation

Yu-Zhewen
Copy link
Contributor

@Yu-Zhewen Yu-Zhewen commented Feb 6, 2025

The routing results for control flows should remain unchanged, no matter whether (or how many data flows) are present. Therefore, the routing pass is now run twice, with the first time to route control flows only, and the second to route data flows. To achieve this,

  • router can now load existing, partially routed packet flows from IR (previously, only circuit flows could be loaded).
  • the above information is passed to the Dijkstra pathfinding algorithm and the arbiter/MSel allocation algorithm, for deterministic behavior.

Note: mlir-aie handled the problem by still running the routing pass only once, and sorting flows before invoking Dijkstra's algorithm (https://github.com/Xilinx/mlir-aie/blob/a60888210c76266cc932d5a0cc922fceef0322f0/lib/Dialect/AIE/Transforms/AIEPathFinder.cpp#L349). However, since Dijkstra is an iterative, congestion-aware algorithm, sorting alone does not fully prevent congestion from data flows affecting control flow routing.

@Yu-Zhewen Yu-Zhewen changed the title [WIP][Do Not Review] Separate the routing for CTRL and non-CTRL [WIP][Do Not Review] Separate the routing for control flows and data flows Feb 11, 2025
@Yu-Zhewen Yu-Zhewen changed the title [WIP][Do Not Review] Separate the routing for control flows and data flows Separate the routing for control flows and data flows Feb 11, 2025
@@ -236,7 +236,8 @@ def AIE_PacketRuleOp: AIE_Op<"rule"> {
let arguments = (
ins I8Attr:$mask,
I8Attr:$value,
Index:$amsel
Index:$amsel,
OptionalAttr<DenseI32ArrayAttr>:$packet_ids
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When lowering packet_flow to packet_rule, the IDs can be used to calculate the mask and value. However, the reverse operation (calculating the IDs from the mask and value) is not feasible, as we will probably get a superset of actual IDs. This can cause issues during the second run of the router.

@@ -108,22 +139,17 @@ PacketRulesOp getOrCreatePacketRules(OpBuilder &builder, SwitchboxOp &swboxOp,
return packetRules;
}

struct ConvertFlowsToInterconnect : OpConversionPattern<FlowOp> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't use OpConversionPattern, as it is also conditional based on whether it is control flow or not.

// CHECK: aie.rule(31, 10, %3)
// CHECK: aie.rule(27, 11, %3)
// CHECK: aie.rule(24, 8, %2)
// CHECK: aie.rule(31, 13, %[[VAL_70]])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -132,6 +117,23 @@ struct PhysPortAndID {
TUPLE_LIKE_STRUCT_RELATIONAL_OPS(PhysPortAndID)
};

struct RouterImpl;
struct Router {
Copy link
Contributor Author

@Yu-Zhewen Yu-Zhewen Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The struct Router has been moved after the definition of PhysPort to accommodate the newly introduced addFixedPacketConnection function.

slavePorts, packetFlows, priorSlavePorts, priorPacketFlows,
deviceModel.getPacketIdMaskWidth());

// Erase any duplicate packet rules in exising, previously-routed packet
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this come up? Can we somehow avoid it?

Copy link
Contributor Author

@Yu-Zhewen Yu-Zhewen Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example is like this:

During the first router run, only the control flow (id=0) is routed:

aie.packet_rules(SOUTH : 1) {
  aie.rule(31, 0, %[[AMSEL_0]]) {packet_ids = array<i32: 0>}
}

In the second router run, when the data flow (id=1) is routed, the packet_rules are updated to:

aie.packet_rules(SOUTH : 1) {
  aie.rule(30, 0, %[[AMSEL_0]]) {packet_ids = array<i32: 0, 1>}
}

Since the control and data packet flows share the same channel (SOUTH:1) and arbiter (%[[AMSEL_0]]), the mask value is updated from 31 to 30.

A probably "safer" option is to replace individual aie.rule entries, rather than erase the aie.packet_rules and regenerate the whole block, but an "issue" is the orders of rules might also get changed due to #1087.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks, this makes sense now.

@@ -1,79 +1,402 @@
// RUN: iree-opt --amdaie-create-pathfinder-flows %s | FileCheck %s
// RUN: iree-opt --pass-pipeline="builtin.module(aie.device(amdaie-create-pathfinder-flows{route-ctrl=true route-data=false},amdaie-create-pathfinder-flows{route-ctrl=false route-data=true}))" --split-input-file %s | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename the file to unit_ctrl_and_data_routing.mlir.

Port{packetRulesOp.getSourceBundle(),
packetRulesOp.getSourceChannel()},
PhysPort::Direction::SRC};
if (slaveGroups.count(sourcePhyPort))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait isn't slaveGroups a std::vector<std::vector<PhysPortAndID>>? What am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #1087, the data type is now

using SlaveGroupsT = std::map<PhysPort, SmallVector<std::set<uint32_t>>>;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks, I still had an older version locally.

// Infer the connections between source and destination ports, by matching
// the amsel.
for (const auto &[amsel, srcPhysPortAndIDs] : amselToSrcPhysPortAndIDs) {
auto &destPhysPorts = amselToDestPhysPorts[amsel];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto &destPhysPorts = amselToDestPhysPorts[amsel];
SmallVector<PhysPort> &destPhysPorts = amselToDestPhysPorts[amsel];

Comment on lines 284 to 285
for (const auto &srcPhysPortAndID : srcPhysPortAndIDs) {
for (const auto &destPhysPort : destPhysPorts) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use explicit types.

// `packet_rules` and `masterset` operations.
PacketFlowMapT priorPacketFlows;
SmallVector<PhysPortAndID> priorSlavePorts;
auto result = getRoutedPacketFlows(device, deviceModel);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto result = getRoutedPacketFlows(device, deviceModel);
FailureOr<std::tuple<PacketFlowMapT, SmallVector<PhysPortAndID>>> result = getRoutedPacketFlows(device, deviceModel);

void initialize(const AMDAIEDeviceModel &targetModel);
void addFlow(TileLoc srcCoords, Port srcPort, TileLoc dstCoords, Port dstPort,
bool isPacketFlow);
bool addFixedCircuitConnection(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably have this operate on PhysPort and avoid passing col/row, similar to addFixedPacketConnection? Not necessarily for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. I will leave it to a different PR.

Another thing, though it might not be a priority now. I think the data structures inside the router could use some cleanup. For example, PathEndPoint is basically the same as PhysPort.

struct PathEndPoint {
TileLoc tileLoc;
Port port;
PathEndPoint() = default;
PathEndPoint(int col, int row, Port port) : PathEndPoint({col, row}, port) {}
PathEndPoint(TileLoc tileLoc, Port port) : tileLoc(tileLoc), port(port) {}
using TupleType = std::tuple<TileLoc, Port>;
PathEndPoint(TupleType t) : PathEndPoint(std::get<0>(t), std::get<1>(t)) {}
operator TupleType() const { return {tileLoc, port}; }
TUPLE_LIKE_STRUCT_RELATIONAL_OPS(PathEndPoint)
};

struct PhysPort {
enum Direction { SRC, DST };
TileLoc tileLoc;
Port port;
Direction direction;
PhysPort(TileLoc t, Port p, Direction direction)
: tileLoc(t), port(p), direction(direction) {}
using TupleType = std::tuple<TileLoc, Port, Direction>;
PhysPort(TupleType t)
: PhysPort(std::get<0>(t), std::get<1>(t), std::get<2>(t)) {}
operator TupleType() const { return {tileLoc, port, direction}; }
TUPLE_LIKE_STRUCT_RELATIONAL_OPS(PhysPort)
};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense.

@Yu-Zhewen Yu-Zhewen force-pushed the zhewen_priority_routing branch from 1fe1ffa to 13eb2f1 Compare February 12, 2025 16:09
Copy link
Collaborator

@jtuyls jtuyls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more nits, but otherwise this looks good to me, thanks!

@Yu-Zhewen Yu-Zhewen enabled auto-merge (squash) February 12, 2025 16:46
@Yu-Zhewen Yu-Zhewen merged commit 8505ad7 into main Feb 12, 2025
7 checks passed
@Yu-Zhewen Yu-Zhewen deleted the zhewen_priority_routing branch February 12, 2025 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants